Introduction: This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.

1 Exploring the Data

## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...
##  [1] "ListingKey"                         
##  [2] "ListingNumber"                      
##  [3] "ListingCreationDate"                
##  [4] "CreditGrade"                        
##  [5] "Term"                               
##  [6] "LoanStatus"                         
##  [7] "ClosedDate"                         
##  [8] "BorrowerAPR"                        
##  [9] "BorrowerRate"                       
## [10] "LenderYield"                        
## [11] "EstimatedEffectiveYield"            
## [12] "EstimatedLoss"                      
## [13] "EstimatedReturn"                    
## [14] "ProsperRating..numeric."            
## [15] "ProsperRating..Alpha."              
## [16] "ProsperScore"                       
## [17] "ListingCategory..numeric."          
## [18] "BorrowerState"                      
## [19] "Occupation"                         
## [20] "EmploymentStatus"                   
## [21] "EmploymentStatusDuration"           
## [22] "IsBorrowerHomeowner"                
## [23] "CurrentlyInGroup"                   
## [24] "GroupKey"                           
## [25] "DateCreditPulled"                   
## [26] "CreditScoreRangeLower"              
## [27] "CreditScoreRangeUpper"              
## [28] "FirstRecordedCreditLine"            
## [29] "CurrentCreditLines"                 
## [30] "OpenCreditLines"                    
## [31] "TotalCreditLinespast7years"         
## [32] "OpenRevolvingAccounts"              
## [33] "OpenRevolvingMonthlyPayment"        
## [34] "InquiriesLast6Months"               
## [35] "TotalInquiries"                     
## [36] "CurrentDelinquencies"               
## [37] "AmountDelinquent"                   
## [38] "DelinquenciesLast7Years"            
## [39] "PublicRecordsLast10Years"           
## [40] "PublicRecordsLast12Months"          
## [41] "RevolvingCreditBalance"             
## [42] "BankcardUtilization"                
## [43] "AvailableBankcardCredit"            
## [44] "TotalTrades"                        
## [45] "TradesNeverDelinquent..percentage." 
## [46] "TradesOpenedLast6Months"            
## [47] "DebtToIncomeRatio"                  
## [48] "IncomeRange"                        
## [49] "IncomeVerifiable"                   
## [50] "StatedMonthlyIncome"                
## [51] "LoanKey"                            
## [52] "TotalProsperLoans"                  
## [53] "TotalProsperPaymentsBilled"         
## [54] "OnTimeProsperPayments"              
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"    
## [57] "ProsperPrincipalBorrowed"           
## [58] "ProsperPrincipalOutstanding"        
## [59] "ScorexChangeAtTimeOfListing"        
## [60] "LoanCurrentDaysDelinquent"          
## [61] "LoanFirstDefaultedCycleNumber"      
## [62] "LoanMonthsSinceOrigination"         
## [63] "LoanNumber"                         
## [64] "LoanOriginalAmount"                 
## [65] "LoanOriginationDate"                
## [66] "LoanOriginationQuarter"             
## [67] "MemberKey"                          
## [68] "MonthlyLoanPayment"                 
## [69] "LP_CustomerPayments"                
## [70] "LP_CustomerPrincipalPayments"       
## [71] "LP_InterestandFees"                 
## [72] "LP_ServiceFees"                     
## [73] "LP_CollectionFees"                  
## [74] "LP_GrossPrincipalLoss"              
## [75] "LP_NetPrincipalLoss"                
## [76] "LP_NonPrincipalRecoverypayments"    
## [77] "PercentFunded"                      
## [78] "Recommendations"                    
## [79] "InvestmentFromFriendsCount"         
## [80] "InvestmentFromFriendsAmount"        
## [81] "Investors"

2 Univariate Plots Section

2.1 Going to start by exploring the Borrower data.

Which state has the most borrowers? California exceeds the rest of the states with 14,717 loans.

What is the Employment Status of Borrowers? We see that most borrowers are employed. (Note that the data has separate ‘Employed’, ‘Full-Time’, and ‘Part-Time’ categories.)

Are most borrowers homeowners? The split is fairly even, with somewhat more homeowners than non-homeowners.

What is the most common income range amongst borrowers? Most incomes are within the range of ‘$25,000 - $74,999’. I also noticed that these categories could be ordered better.
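One way to put the income categories in a more natural order is to set the factor levels explicitly. The sketch below uses a small toy vector rather than the real column, and the level labels are an assumption: only "$0" and "$1-24,999" are visible in the `str()` output above, so the remaining labels would need to be checked against the data.

```r
# Toy stand-in for the IncomeRange column. The level labels are assumed;
# only "$0" and "$1-24,999" are visible in the str() output.
income <- factor(c("$25,000-49,999", "$50,000-74,999", "$100,000+",
                   "$1-24,999", "$25,000-49,999", "Not displayed"))

# Desired order: non-numeric categories first, then ascending income.
income_order <- c("Not displayed", "Not employed", "$0", "$1-24,999",
                  "$25,000-49,999", "$50,000-74,999", "$75,000-99,999",
                  "$100,000+")

# Rebuild the factor with explicit, ordered levels.
income_ordered <- factor(income, levels = income_order, ordered = TRUE)

# Counts (and any bar chart built from them) now follow the level order
# instead of the default alphabetical order.
income_counts <- table(income_ordered)
print(income_counts)
```

With the levels set this way, a bar chart of the column would show the ranges from lowest to highest instead of alphabetically.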

What is the most common Prosper Alpha rating amongst borrowers? We see that most borrowers did not have a rating. However, ‘C’ is the most common rating among those who have been rated.

We see that most of the borrowers have a low debt to income ratio.

2.2 Next I am going to look at Loan Data

What is the most popular loan amount? We see that most loans are under $15,000, and the most common loan amount is $4,000.

What are the terms of the loans? Most loans are 36 months.

What is the count of each loan status? Most loans are in the ‘current’ status.

How many loans were issued per year? We see that 2013 had the most loans. There is a large dip in 2009. (Why the dip in 2009?)

3 Univariate Analysis

3.0.1 What is the structure of your dataset?

This is a large dataset consisting of 113,937 observations of 81 variables. It covers Prosper loans from 2005 through 2014.

3.0.2 What is/are the main feature(s) of interest in your dataset?

I broke down this dataset in two main parts. Borrowers and Loans.

Borrowers - Most borrowers are from California and made between $25k and $75k. Most borrowers are employed. It was almost an even split whether borrowers owned their own home or not. Most borrowers had a good debt-to-income ratio.

Loans - Most loans were under $15,000 and had a 36-month term. Most loans were in good standing. Between 2005 and 2014, 2013 had the most loans issued. There was also a big decline in loans in 2009.

3.0.3 What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

I would like to dig a little deeper into ‘Prosper Rating’.

3.0.4 Did you create any new variables from existing variables in the dataset?

I created ‘LoanOriginationYear’ out of ‘LoanOriginationDate’.
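A minimal sketch of how such a year column could be derived, assuming the date is stored as a factor in the "YYYY-MM-DD HH:MM:SS" format shown in the `str()` output. The small `loans` data frame here is a toy stand-in, not the real data, and the original analysis may have used a different approach.

```r
# Toy stand-in for the loans data frame; timestamps use the same
# "YYYY-MM-DD HH:MM:SS" format seen in the str() output.
loans <- data.frame(
  LoanOriginationDate = c("2005-11-15 00:00:00", "2009-07-20 00:00:00",
                          "2013-03-01 00:00:00", "2013-10-12 00:00:00"),
  stringsAsFactors = TRUE
)

# Convert factor -> character -> Date (the time part is ignored),
# then keep the four-digit year as an integer.
origination_date <- as.Date(as.character(loans$LoanOriginationDate),
                            format = "%Y-%m-%d")
loans$LoanOriginationYear <- as.integer(format(origination_date, "%Y"))

# Number of loans issued per year.
loans_per_year <- table(loans$LoanOriginationYear)
print(loans_per_year)
```

The per-year counts from `table()` are the kind of summary behind a loans-per-year bar chart.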

3.0.5 Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

I noticed that within ‘Employment Status’ there were the categories ‘Employed’, ‘Full Time’, and ‘Part Time’. These three overlap and could probably be combined.
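If the overlapping employment categories were to be combined, one option is to rename the factor levels so that they merge. This is only a sketch on a toy vector: the exact level spellings ("Full-time", "Part-time") are an assumption and should be checked with `levels()` on the real column, and this step was not part of the analysis above.

```r
# Toy stand-in for the EmploymentStatus column. The level spellings
# "Full-time" and "Part-time" are assumed, not taken from the output above.
employment <- factor(c("Employed", "Full-time", "Part-time", "Self-employed",
                       "Not employed", "Employed", "Full-time"))

# Renaming several levels to the same label merges them into one level.
employment_combined <- employment
merge_these <- levels(employment_combined) %in% c("Full-time", "Part-time")
levels(employment_combined)[merge_these] <- "Employed"

# Compare the counts before and after merging.
print(table(employment))
print(table(employment_combined))
```

Keeping the original column and writing the merged version to a new variable makes it easy to compare both groupings.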

I also noticed that the Prosper Rating has a lot of missing values.
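The missing ratings can be counted directly. In the `str()` output, the numeric rating uses `NA` for missing values while the alpha rating uses an empty-string level, so each needs its own check. The data frame below is a toy stand-in built from the first six rows shown in that output, with the letter for each code inferred from the numeric rating.

```r
# Toy stand-in: the first six rows of the two rating columns as they
# appear in the str() output (NA in the numeric version, "" in the alpha one).
loans <- data.frame(
  ProsperRating..numeric. = c(NA, 6L, NA, 6L, 3L, 5L),
  ProsperRating..Alpha.   = factor(c("", "A", "", "A", "D", "B"))
)

# Missing numeric ratings are NA.
missing_numeric <- sum(is.na(loans$ProsperRating..numeric.))

# Missing alpha ratings are stored as the empty-string level.
missing_alpha <- sum(loans$ProsperRating..Alpha. == "")

# Share of rows without a rating.
share_missing <- mean(is.na(loans$ProsperRating..numeric.))

print(c(numeric = missing_numeric, alpha = missing_alpha))
print(share_missing)
```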

4 Bivariate Plots Section

Continuing to notice the correlation between Borrower Rate and Credit Score.

We can see that loan amounts decrease as the Prosper Rating worsens.

As expected, we see that the borrower rate increases as the Prosper Rating worsens.

As expected, we see that the borrower rate decreases as the credit score improves.

We see a small increase in the Prosper Rating as monthly income increases.

As expected, a higher credit score tends to go with a higher Prosper Rating.

Loan amounts have been increasing since 2009, but leveled out in 2013 and 2014.

As expected, borrowers who are employed qualify for larger loan amounts.

5 Bivariate Analysis

5.0.1 Talk about some of the relationships you observed in this part of the investigation.

5.0.2 How did the feature(s) of interest vary with other features in the dataset?

I decided to explore the Prosper Ratings and try to find the relations that might influence the Prosper Rating.

I noticed that as the Prosper Rating worsened, the loan amounts decreased and the borrower rate increased. I did not find much of a relationship between monthly income and the Prosper Rating.

I also noticed that the borrower rate is correlated with the credit score. The higher the credit score, the lower the interest rate the borrower gets.
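The direction of this relationship can be checked with a correlation coefficient. The sketch below uses only the first five rows of `CreditScoreRangeLower` and `BorrowerRate` as printed (rounded) in the `str()` output, so it illustrates the calculation and not the result for the full dataset.

```r
# Toy stand-in: the first five rows of the two columns as printed in the
# str() output. Five rows are far too few for a real conclusion.
loans <- data.frame(
  CreditScoreRangeLower = c(640, 680, 480, 800, 680),
  BorrowerRate          = c(0.158, 0.092, 0.275, 0.0974, 0.2085)
)

# Pearson correlation; use = "complete.obs" drops rows with missing values,
# which matters on the full data where some columns contain NA.
rate_score_cor <- cor(loans$CreditScoreRangeLower, loans$BorrowerRate,
                      use = "complete.obs")
print(rate_score_cor)
```

A negative coefficient matches the observation that higher credit scores go with lower borrower rates.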

5.0.3 What was the strongest relationship you found?

The strongest relationships I found were between the borrower rate and the credit score, and between the borrower rate and the Prosper Rating.

6 Multivariate Plots Section

Continues to show the relationship between these 3 variables.

The first thing I notice is that only borrowers with top Prosper Ratings qualify for a loan larger than $25k.

Monthly income of the borrower doesn’t seem to have much of a relation with the rating.

We continue to see that the Prosper Rating is a good predictor of the borrower rate.

The heat map shows that highly rated “Not employed” borrowers have to pay slightly higher rates.

7 Multivariate Analysis

7.0.1 Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

I continue to notice that the borrower rate, credit score, and Prosper Rating have a strong relationship. As the Prosper Rating and credit score improve, the borrower rate decreases.

7.0.2 Were there any interesting or surprising interactions between features?

One surprising thing I noticed is that only borrowers with top Prosper Ratings qualify for a loan larger than $25k.


8 Final Plots and Summary

8.0.1 Plot One

8.0.2 Description One

This bar graph grabbed my attention from the very beginning. I saw a sudden decrease in 2009 and was immediately curious about what caused such a large drop. After some research online, I learned that October 15, 2008 to July 13, 2009 was Prosper’s SEC quiet period, during which its lending activity was suspended pending SEC approval. Prosper relaunched in July 2009. (https://en.wikipedia.org/wiki/Prosper_Marketplace#2009_post_SEC_relaunch)

8.0.3 Plot Two

8.0.4 Description Two

As expected, we see that the borrower rate increases as the Prosper Rating worsens. Borrowers without a good credit history have worse Prosper Ratings, so they are higher risk and are charged a higher borrower rate.

8.0.5 Plot Three

8.0.6 Description Three

For the most part, only borrowers with top Prosper Ratings qualify for a loan larger than $25k. As above, borrowers with worse Prosper Ratings are a greater risk, so Prosper presumably limits the loan amount for ‘risky’ borrowers.


9 Reflection

This was a large dataset about loans, a subject with which I do not have much personal experience. At first I explored the dataset at random to get a feel for it. In my first explorations I broke the dataset down into two groupings, borrowers and loans. As I continued the exploratory data analysis, I decided to focus on Prosper Ratings and the relationships attached to that rating. I then provided examples of how borrower rate, Prosper Rating, and credit score are strongly related.

There is so much more that can be analyzed in this data set, and so many exploratory paths you could go down. One idea for more exploration I have is to learn more about delinquent loans and all of the relationships with that scenario.